Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning

Authors

Benjamin Eysenbach

Shixiang Gu

Julian Ibarz

Sergey Levine

Abstract

Deep reinforcement learning algorithms can learn complex behavioral skills, but real-world application of these methods requires a large amount of experience to be collected by the agent. In practical settings, such as robotics, this involves repeatedly attempting a task, resetting the environment between each attempt. However, not all tasks are easily or automatically reversible. In practice, this learning process requires extensive human intervention. In this work, we propose an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt. By learning a value function for the reset policy, we can automatically determine when the forward policy is about to enter a non-reversible state, providing for uncertainty-aware safety aborts. Our experiments illustrate that proper use of the reset policy can greatly reduce the number of manual resets required to learn a task, can reduce the number of unsafe actions that lead to non-reversible states, and can automatically induce a curriculum.
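The early-abort idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that uncertainty is represented by a small ensemble of reset value estimates and that the most pessimistic estimate is compared against a threshold; the names `should_abort`, `choose_action`, `reset_q_ensemble`, and `q_min` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: a reset value estimate Q_reset(state, action) -> float.
QFunction = Callable[[Sequence[float], int], float]
Policy = Callable[[Sequence[float]], int]


def should_abort(state: Sequence[float], action: int,
                 reset_q_ensemble: Sequence[QFunction], q_min: float) -> bool:
    """Return True when the proposed forward action looks hard to undo.

    Assumption: the ensemble is aggregated pessimistically (minimum), so
    disagreement between members makes an abort more likely.
    """
    estimates = [q(state, action) for q in reset_q_ensemble]
    return min(estimates) < q_min


def choose_action(state: Sequence[float], forward_policy: Policy,
                  reset_policy: Policy,
                  reset_q_ensemble: Sequence[QFunction],
                  q_min: float) -> Tuple[int, bool]:
    """Pick the forward action unless the reset value estimate is too low.

    Returns (action, aborted). When aborted is True the action comes from
    the reset policy instead of the forward policy.
    """
    proposed = forward_policy(state)
    if should_abort(state, proposed, reset_q_ensemble, q_min):
        return reset_policy(state), True
    return proposed, False
```

In a training loop, a caller would invoke `choose_action` at every step and hand control to the reset policy for the rest of the episode once `aborted` is True. The threshold `q_min` trades off exploration against the number of irreversible states reached, and its value here is left to the caller.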

Similar resources

Reinforcement Learning Based PID Control of Wind Energy Conversion Systems

In this paper an adaptive PID controller for Wind Energy Conversion Systems (WECS) has been developed. The adaptation technique applied to this controller is based on Reinforcement Learning (RL) theory. Nonlinear characteristics of wind variations as plant input, wind turbine structure and generator operational behavior demand a high-quality adaptive controller to ensure both robust stability an...

A Multiagent Reinforcement Learning algorithm to solve the Community Detection Problem

Community detection is a challenging optimization problem that consists of searching for communities that belong to a network under the assumption that the nodes of the same community share properties that enable the detection of new characteristics or functional relationships in the network. Although there are many algorithms developed for community detection, most of them are unsuitable when ...

C-trace: a New Algorithm for Reinforcement Learning of Robotic Control

There has been much recent interest in the potential of using reinforcement learning techniques for control in autonomous robotic agents. How to implement effective reinforcement learning in a real-world robotic environment still involves many open questions. Are standard reinforcement learning algorithms like Watkins' Q-learning appropriate, or are other approaches more suitable? Some specific...

Machine Learning Models to Enhance the Science of Cognitive Autonomy

Intelligent Autonomous Systems (IAS) are highly cognitive, reflective, multitask-able, and effective in knowledge discovery. Examples of IAS include software systems that are capable of automatic reconfiguration, autonomous vehicles, network of sensors with reconfigurable sensory platforms, and an unmanned aerial vehicle (UAV) respecting privacy by deciding to turn off its camera when pointing ...

Safe Reinforcement Learning via Formal Methods Toward Safe Control Through Proof and Learning

Formal verification provides a high degree of confidence in safe system operation, but only if reality matches the verified model. Although a good model will be accurate most of the time, even the best models are incomplete. This is especially true in Cyber-Physical Systems because high-fidelity physical models of systems are expensive to develop and often intractable to verify. Conversely, rei...

Journal title:

CoRR

Volume abs/1711.06782, Issue

Pages -

Publication date: 2017
